Three Dissimilarity Measures to Contrast Dendrograms
Abstract
We discuss three dissimilarity measures between dendrograms defined over the same set: the triples, partition, and cluster indices. All of them decompose the dendrograms into subsets. For the triples and partition indices, these subsets correspond to binary partitions containing some clusters, whereas for the cluster index, a novel dissimilarity measure introduced in this paper, the subsets are exclusively clusters. In chemical applications, dendrograms gather clusters that contain similarity information about the data set under study. The cluster index is therefore the most suitable dissimilarity measure between dendrograms resulting from chemical investigations. An application example of the three measures highlights the advantages of the cluster index over the other two in similarity studies. Finally, the cluster index is used to measure the differences between five dendrograms obtained by applying five common hierarchical clustering algorithms to a database of 1000 molecules.
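As a rough illustration of decomposing dendrograms into clusters and comparing the resulting sets, the sketch below may help. It is not the cluster index defined in the paper, whose formula is not given in this abstract. It assumes dendrograms are written as nested tuples of leaf labels, and the function names `clusters` and `cluster_dissimilarity` are hypothetical. The score is a simple Jaccard-style distance between the two cluster sets, chosen only for illustration.

```python
def clusters(tree):
    """Collect the leaf set of every internal node of a nested-tuple dendrogram."""
    found = set()

    def leaves(node):
        # A non-tuple node is a single leaf (assumed representation).
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)  # each internal node contributes one cluster
        return members

    leaves(tree)
    return found


def cluster_dissimilarity(tree_a, tree_b):
    """Illustrative score: share of clusters not common to both dendrograms."""
    a, b = clusters(tree_a), clusters(tree_b)
    union = a | b
    if not union:  # both inputs are single leaves, so there is nothing to compare
        return 0.0
    return len(a ^ b) / len(union)


# Two dendrograms over the same five objects, differing in one merge.
d1 = ((("a", "b"), "c"), ("d", "e"))
d2 = ((("a", "c"), "b"), ("d", "e"))
print(cluster_dissimilarity(d1, d2))  # 2 unshared clusters out of 5 distinct ones
```

Here the two dendrograms share the clusters {a, b, c}, {d, e}, and the full set, and differ only in {a, b} versus {a, c}, so the score reflects exactly the clusters on which they disagree.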
Similar resources
CAFE: aCcelerated Alignment-FrEe sequence analysis
Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software,...
Additive Evolutionary Trees
Metric trees are dendrograms which show the phylogenetic relationships for a set of contemporary species. These dendrograms have numerical values attached to the branches. If the sum of these values on the branches between any two contemporary species is equal to the dissimilarity between these two species, the metric tree is said to be additive and possess an additive dissimilarity matrix. Met...
Assessing Dissimilarity Measures for Sample-Based Hierarchical Clustering of RNA Sequencing Data Using Plasmode Datasets
Sample- and gene-based hierarchical cluster analyses have been widely adopted as tools for exploring gene expression data in high-throughput experiments. Gene expression values (read counts) generated by RNA sequencing technology (RNA-seq) are discrete variables with special statistical properties, such as over-dispersion and right-skewness. Additionally, read counts are subject to technology a...
Sandrine Pavoine
Ecological studies have now gone beyond measures of species turnover towards measures of phylogenetic and functional dissimilarity. This change of perspective has a main objective: disentangling the processes that drive species distributions from local to broad scales. A fundamental difference between phylogenetic and functional analyses is that phylogeny is intrinsically dependent ...
Combining dissimilarity measures for prototype-based classification
Prototype-based classification, identifying representatives of the data and suitable measures of dissimilarity, has been used successfully for tasks where interpretability of the classification is key. In many practical problems, one object is represented by a collection of different subsets of features, that might require different dissimilarity measures. In this paper we present a technique f...
Journal title:
- Journal of chemical information and modeling
Volume 47, Issue 3
Pages: -
Publication date: 2007